About the data/article in a nutshell: Tom uses a combination of sources to make his point → while not a bottom ‘performer’; especially compared to the US, UK’s National Health Service (NHS) did not wait for the COVID crisis to show its limits and harbor increasing consumer dissatisfaction. That downward trend can be traced back to the early 2010’s; and that comes from the service running lean - too lean, among other reasons. For more detail on the sources, feel free to visit the actual article with link embedded at the top of the page in to author’s name. If you like to download the data, feel free to use urls here.
Overall Strategy for building first plot: The data for first plot is smoothed out/interpolated data; this is usually done to ‘well smooth’ the data and delineate clearer trends over time; in this case select European countries’ survey takers’ scores on their overall satisfaction level towards their respective countries’ public health services (ESS - European Social Survey). In this case, there is an additional reason the author smoothed the data; it is to ‘complete’ the years given that the ESS is done once every two years; note that I am assuming here but it’s not an outrageous assumption to make. So in order to match that smoothing, I go with geom_smooth() from ggplot2 and keep span at default; after a few iterations; the data points (for non survey years) match highly to what Tom displays in the first graph. Finally, since the graph is interactive, I use ggiraph package to emulate said interactivity; a JS based R package that lets you add tooltips/hover/highlight upon hover/downplay non-hovered, etc. all the usual things one expects from an interactive plot; without having to build a Shiny app; which for this exercise/first plot (and the rest); would be like building a Gatling gun to aim at an ant.
Steps taken: While I try to be as detailed in my comments as possible; it’s still helpful to lay out the step by step process as a numbered list to get the overall chain of what the code is supposed to do on a high level - without having to go into the nitty gritty- the comments in the code chunks and in that regard should hopefully help:
Found the source of the data from HTML Source Page; clicked on Network tab after hovering on the plot panel; refreshed the page; and found ‘dataset’ under ‘datawrapper’.
The data was wide in structure (from raw csv): 17 columns (1 column for year, 8 hex-coded columns with imputed/smoothed values, and 8 columns for country abbreviations with survey data for even years, NULL otherwise).
Discovered the hex columns and country columns didn’t align in a straightforward way. Columns were randomly ordered within each set, requiring a ranking approach rather than pairwise matching (one hex column to the symmetrical position of the country labeled column).
Implemented a solution by sorting satisfaction scores per year, which helped group values by country through proximity of their scores. This approach works well since the values are interpolated through smoothing, making them very close to one another from row to row. This might not work in other cases, but it does here.
Combined two sorted datasets: year + hex columns, and year + country columns to create a properly aligned mapping, joined facts data (with scores/values) on newly created mapped long datasets (converted from wide- almost always much harder to work with) to then finalize the dataset for visualization. More detail on ‘finalized the dataset’ can be found in the comments of the actual code. Prepared the final clean dataset for interactive plotting with ggiraph.
Health Service Satisfaction
Show the code
#|echo: false#|message: false#|warning: false#|include: falseinvisible({# install pacman if it's not already installedif(!requireNamespace("pacman", quietly =TRUE))install.packages("pacman")# install.packages("gdtools", type = "source")## NOTE; potential (might not be needed) steps on mac for registering then loading Roboto font into Quarto below## from terminal/shell# brew install cairo fontconfig freetype pkg-config# export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:/opt/homebrew/share/pkgconfig"# export PKG_CFLAGS="-I/opt/homebrew/include"# export PKG_LIBS="-L/opt/homebrew/lib"## then from Rstudio# install.packages("gdtools", type = "source")pacman::p_load(xml2,downlit,gdtools,tidyverse,quarto,chromote,here,tidycensus,janitor,purrr,ggtext,ggshadow,ggiraph,gfonts,showtext,ggborderline,shiny,gt,rsvg,magick,stringr,ggimage)showtext::showtext_auto(enable =TRUE)})# loading in the fonts from venv created that has the Roboto ones (for ploting); a widespread font used by The Times, The Economist, etc.tryCatch({sysfonts::font_add(family ="roboto-regular", regular ="~/Library/Fonts/Roboto-Regular.ttf")sysfonts::font_add(family ="roboto-bold", regular ="~/Library/Fonts/Roboto-Bold.ttf")sysfonts::font_add(family ="roboto-condensed", regular ="~/Library/Fonts/Roboto_Condensed-Regular.ttf")sysfonts::font_add(family ="roboto-condensed-bold", regular ="~/Library/Fonts/Roboto_Condensed-Bold.ttf")}, error =function(e){message("Fonts not available locally: ", e$message)})# used for most things within ggplot so we can assign them as default themes settheme_set(theme_minimal()+theme( text =element_text(family ="roboto-regular", size =11), plot.title =element_text(family ="roboto-bold", size =16), plot.subtitle =element_text(family ="roboto-regular", size =12), axis.text =element_text(family ="roboto-regular", size =10), axis.title =element_text(family ="roboto-condensed", size =11)))
While above took care of importing required libraries and setting general options such as plot theme and text font to be used; below is the start of data related tasks; from initial pull, to wrangling, to finally output the visualizations.
Data Pull
Show the code
#|echo: false#|message: false#|warning: false#|include: false#|eval: falseinvisible({b<-ChromoteSession$new()b$Page$navigate("https://www.thetimes.com/comment/columnists/article/we-keep-pumping-money-into-the-nhs-is-it-good-value-blq8bxc39")Sys.sleep(3)# allow some time for dynamic content to render})# extract all iframe srcs (joined by || in this case)iframes_html<-b$Runtime$evaluate("Array.from(document.querySelectorAll('iframe')).map(el => el.src).join('||')")$result$value# split and filter valid Datawrapper url'schart_urls<-str_split(iframes_html, "\\|\\|")[[1]]|>str_subset("^https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+$")all_data<-purrr::map_dfr(chart_urls, function(url){message("Navigating to: ", url)b$Page$navigate(url)Sys.sleep(3)html<-b$Runtime$evaluate("document.documentElement.outerHTML")$result$value# match visible chart values if anypattern<-'aria-datavariables="year,\\s*([A-Z]+)".*?aria-datavalues="([0-9]{4}),\\s*([0-9.]+)"'matches<-str_match_all(html, pattern)[[1]]# match dataset.csv url as well csv_pattern<-"https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+/dataset\\.csv"csv_link<-str_extract(html, csv_pattern)if(is.na(csv_link)){csv_link<-str_glue("{url}/dataset.csv")}tibble( chart_url =url, country =if(nrow(matches))matches[, 2]elseNA, year =if(nrow(matches))as.integer(matches[, 3])elseNA, value =if(nrow(matches))as.numeric(matches[, 4])elseNA, dataset_csv =csv_link)})# add a custom gt boilerplate -from {gt} package (great tables)-to reduce code redundancy (having to copy/paste same chunks of code every # time we turn a tibble into a gt object)gt_nyt_custom<-function(x, title='', subtitle='', first_10_rows_only=TRUE){x<-x|>clean_names(case ='title')numeric_cols<-x|>select(where(is.double))|>names()integer_cols<-x|>select(where(is.integer))|>names()title_fmt<-if(title!="")glue::glue("**{title}**")else""subtitle_fmt<-if(subtitle!="")glue::glue("*{subtitle}*")else""x|>(\(x)if(first_10_rows_only)slice_head(x, n =10)elsex)()|>gt()|>tab_header( title =md(title_fmt), subtitle =md(subtitle_fmt))|>tab_style( style =list(cell_text(color ='#333333')), locations =cells_body())|>tab_style( style =list(cell_text(color ='#CC6600', weight ='bold')), locations =cells_column_labels(everything()))|>fmt_number( columns =c(numeric_cols), decimals =1)|>fmt_number( columns =c(integer_cols), decimals =0)|>tab_options( table.font.names =c("Merriweather", "Georgia", "serif"), table.font.size =14, heading.title.font.size =18, heading.subtitle.font.size =14, column_labels.font.weight ="bold", column_labels.background.color ="#eeeeee", table.border.top.color ="#dddddd", table.border.bottom.color ="#dddddd", data_row.padding =px(6), row.striping.include_table_body =TRUE, row.striping.background_color ="#f9f9f9")}
Metadata
Show the code
#|echo: false#|message: false#|warning: false# reveal dataset urls/csvsall_data|>count( url =chart_url, download_link =dataset_csv)|>select(-n)|>gt_nyt_custom( title ='Dataset Ids')|>cols_label( Url ="Plot URL", `Download Link` ="Link to CSV")|>tab_footnote("In the event you download the links yourself and run your own script, the third and last should be treated as tsv files, otherwise csv's")
Dataset Ids
Plot URL
Link to CSV
https://datawrapper.dwcdn.net/7NJRB/1
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
https://datawrapper.dwcdn.net/Bxhol/4
https://datawrapper.dwcdn.net/Bxhol/4/dataset.csv
https://datawrapper.dwcdn.net/JH3Qn/1
https://datawrapper.dwcdn.net/JH3Qn/1/dataset.csv
https://datawrapper.dwcdn.net/Mc3q2/2
https://datawrapper.dwcdn.net/Mc3q2/2/dataset.csv
https://datawrapper.dwcdn.net/eXQPs/1
https://datawrapper.dwcdn.net/eXQPs/1/dataset.csv
In the event you download the links yourself and run your own script, the third and last should be treated as tsv files, otherwise csv's
Data Sample
Show the code
# reveal data sample for year 2004 as an exampleall_data|>filter(year==2004)|>select(2:last_col())|>gt_nyt_custom()|>tab_header( title =md("**Chart Data Summary**"), subtitle =md("*Extracted from embedded datawrapper in the HTML*"))
Chart Data Summary
Extracted from embedded datawrapper in the HTML
Country
Year
Value
Dataset Csv
DE
2,004
4.7
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
PT
2,004
3.5
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
IE
2,004
4.1
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
NO
2,004
5.7
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
FR
2,004
5.8
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
ES
2,004
5.8
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
GB
2,004
5.4
https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
Data Wrangling
Show the code
#|echo: false#|message: false#|warning: false# for easier referencing, assign file types (csv where appropriate, tsv otherwise) file_info<-tibble( path =unique(all_data$dataset_csv), name =c("health_service_sat:csv", "value_for_money:csv", "room_to_improve:tsv", "barely_beds:csv", "budget_breakdowns:tsv"))|>separate(name, into =c('dataset_name', 'file_type'), sep ="\\:")# loop thru datasets, read them in, and then assign them to the global environmentinvisible({file_info|>mutate( data =pmap(list(path, file_type), \(path, file_type)if(file_type=="csv")read_csv(path)elseread_tsv(path)))|>select(dataset_name, data)|>deframe()|>list2env(envir =.GlobalEnv)})# set country 'switch'; so that tooltip can reflect full country name (spelled out) accordingly for imputed values (non survey years smoothed values)country_labels<-c( NO ="Norway", DE ="Germany", ES ="Spain", FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal")country_label_tibble<-c( NO ="Norway", DE ="Germany", ES ="Spain", FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal")|>enframe()# clean the first dataset: health_service_sat, to prep for plotting.# it has 17 columns: 1 year column, 8 hex-coded columns (imputed/smoothed values), # and 8 columns for country abbreviations (survey data, even years).# initially assumed hex columns align pairwise with country columns; but it was not the case.s# columns are randomly ordered within each set, so we use ranking instead.# sorting by satisfaction score per year helps group values by country (via proximity of their scores).# we then combine two sorted datasets: year + hex columns, and year + country columns# note that this might not always be the go-to solution but in this case, # and given that the values are interpolated (through smoothing), we can safely bet that the values will be very close to one another# from one row to the nexthex_to_country_mapping<-health_service_sat|>pivot_longer(-year)|>slice_max(year)|>filter(str_starts(name, '\\#')&!str_detect(name, 'A9FF')# looking at last values from article curves, we can infer this is Italy so )|>arrange(value)|>bind_cols(health_service_sat|>pivot_longer(-year)|>slice_max(year)|>filter(!str_starts(name, '\\#')&!str_detect(name, 'IT'))|>arrange(value))|>select( years =1, hex_code =2, second_to_last_val =3, years_max =4, country_abb =5, last_val =6)|>mutate( val_diff =abs(last_val-second_to_last_val))|>arrange(val_diff)|>select(hex_code, country_abb)|># also join to country_label_tibble to get full country names for future useinner_join(country_label_tibble, join_by(country_abb==name))# now we can map the randomly assigned hex value labels to the actual columns/countries, and create 8 series,# one for each countryhealth_service_sat<-health_service_sat|>pivot_longer(-year)|>left_join(hex_to_country_mapping, join_by(name==hex_code))|>mutate( country_abb =coalesce(country_abb, name))|>inner_join(hex_to_country_mapping, join_by(country_abb==country_abb))|>select(year, country_abb, country =value, value =value.x)|>drop_na()extract_smooth_build<-function(tibble, country='GB'){initial_pull<-all_data|>filter(country%in%{{country}})|>ggplot(aes(x =year, y =value))+geom_smooth(method ='loess')# fetch country abbs for ids, and rangescountry_ids<-c(na.omit(all_data|>pull(country)|>unique()))country_max<-all_data|>filter(country=={{country}})|>pull(value)|>max()country_min<-all_data|>filter(country=={{country}})|>pull(value)|>min()# access smoothed, include actual years to imputed/smoothed points, cap at min max per country/series# and keep only columns of interestsmoothed_df<-ggplot_build(initial_pull)[[1]]|>as.data.frame()|>as_tibble()complete_series<-smoothed_df|>select(year =x, value =y)|>mutate(country:=country)|>bind_rows(all_data|>filter(country=={{country}})|>select(year, value))|>mutate( year =as.integer(year), year_val_tie_breaker =if_else(is.na(country), 1, 0))|>group_by(country, year)|>arrange(desc(year_val_tie_breaker))|>mutate(ties =row_number())|>filter(if(n()<4)TRUEelseties+year_val_tie_breaker!=1# make sure every year/country combo gets same no. of obs# and that original values (only in the event a given year is even or survey year) take precedence over smoothed ones# otherwise just pass/do nothing)|>ungroup()|># ensuer smoothed values don't go below/beyond lower/upper boundsmutate( value =pmin(pmax(value, country_min), country_max))|>arrange(year)|>fill(country, .direction ='downup')|># since every year starts with select(year, country, value)return(complete_series)}# country vector to loop thrucountry_name_abbs<-c(na.omit(all_data|>pull(country)|>unique()))# combine all seriesall_series<-map_dfr(.x =country_name_abbs, ~extract_smooth_build(tibble =all_data, country =.x))# set contry 'switch; so that tooltip can change accordingly for odd numebred yearscountry_labels<-c( NO ="Norway", DE ="Germany", ES ="Spain", FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal")# adding year as continuous variable (decimal years) so that points don't overlap but strech over whithin a year to year spanall_series<-all_series|>mutate( rn =row_number(), .by =c(country, year))|>mutate( decimal_year =if_else(rn==1, year, year+rn/8))|>mutate( year =decimal_year)|>select(-decimal_year)# also join on country full name mapping so we can generate a consolidate data_id that links the aestethics together (for interactive simultaneus highlighting, etc.)all_series<-all_series|>inner_join(country_labels|>enframe()|>rename(values =value), join_by(country==name))|>mutate( data_id =str_c(country, values))|>select(-values)|>mutate( country_name =str_sub(data_id, 3, 20))# also generate visible (and non visible years by exclusion) as they don't visually get the same properties; visible (even numbered years) get the country abb as a tooltip (and larger markers/circles), # while 'invisible' ones (odd numbered years along with year 2023) get their country names fully spelled out and get transparent marker/circle fillvisible_years<-c(seq(2002, 2022, 2), 2023)visible_points<-all_series|>filter(round(year)%in%visible_years&floor(year)==ceiling(year))invisible_points<-all_series|>filter(!round(year)%in%visible_years&floor(year)!=ceiling(year))# final touchups# set color mappingscolor_map<-expr(case_when(country%in%c('NO', 'Norway')~'#d43b45',country%in%c('DE', 'Germany')~'#DCA825',country%in%c('ES', 'Spain')~'#b01622',country%in%c('FR', 'France')~'#487caa',country%in%c('GB', 'UK')~'#264250',country%in%c('IE', 'Ireland')~'#61A861',country%in%c('PT', 'Portugal')~'#d27e4e',TRUE~'#000000'))# set tooltip mappingstooltip_map<-expr(case_when(!year%in%c(seq(2002, 2022, 2), 2023)&country%in%names(country_labels)~country_labels[country],TRUE~country))label_data<-all_series|>group_by(country)|>arrange(desc(year))|>filter(row_number()==1)|>mutate( y_offset =case_when(country=='ES'~value+.1,country=='FR'~value+0,country=='DE'~value-.05,country=='GB'~value-.1,country=='PT'~value+.2,TRUE~value))|>ungroup()|>mutate( country_name =case_when(country=="DE"~"Germany",country=="ES"~"Spain",country=="FR"~"France",country=="GB"~"UK",country=="IE"~"Ireland",country=="NO"~"Norway",country=="PT"~"Portugal",TRUE~NA_character_), country_color =case_when(country%in%c("DE", 'Germany')|country_name%in%'Germany'~"#9b6e00", # override DE/Germany label color here since curve color is different than country label color (only one)country%in%c('NO', 'Norway')~'#d43b45',country%in%c('ES', 'Spain')~'#b01622',country%in%c('FR', 'France')~'#487caa',country%in%c('GB', 'UK')~'#264250',country%in%c('IE', 'Ireland')~'#61A861',country%in%c('PT', 'Portugal')~'#d27e4e',TRUE~'#000000'))|>inner_join(country_labels|>enframe()|>rename(values =value), join_by(country==name))|>mutate( data_id =str_c(country, values), country =if_else(country=='DE', 'Germany', country))# add caption to match Tom'scaption_text<-"<span style='color:#232323;'>0 = extremely bad, 10 = extremely good</span><br><span style='color:#939291; font-weight: normal;'>Chart: Tom Calver | The Times and The Sunday Times • Source: ESS/K. Kardous</span>"p<-all_series|>distinct()|>ggplot(aes(x =year, y =value, group =data_id, color =country))+scale_color_manual( values =c('NO'="#d43b45",'DE'='#DCA825','ES'='#b01622','FR'='#487caa','GB'='#264250','IE'='#61A861','PT'='#d27e4e'))+scale_y_continuous( breaks =seq(0, 7, 1), limits =c(0, 8))+scale_x_continuous( breaks =seq(2002, 2022, 2), limits =c(2002, 2025), expand =c(0, 0.1))+theme( legend.position ='none', panel.grid.major.x =element_blank(), panel.grid.minor.x =element_blank())+geom_smooth_interactive( data =all_series,aes(x =year, y =value, data_id =paste0(country, country_name)), method ="loess", se =FALSE, linewidth =3.5, # thick line acts as the 'border' alpha =1, show.legend =FALSE, color ="white")+# colored interactive smooth linegeom_smooth_interactive( data =all_series|>filter(!country%in%'IE'),aes(data_id =paste0(country, country_name)), method ="loess", se =FALSE, linewidth =0.9, fill =NA, show.legend =FALSE)+geom_smooth_interactive( data =all_series|>filter(country%in%'IE'),aes(data_id =paste0(country, country_name)), method ="loess", se =FALSE, linewidth =0.9, fill =NA, show.legend =FALSE)+scale_y_continuous( breaks =seq(0, 7, 1), limits =c(0, 8))+scale_x_continuous( breaks =seq(2002, 2022, 2), limits =c(2002, 2025), expand =c(0, 0.1))+labs( x =NULL, y =NULL, caption =caption_text)+# final touchoups before interactive rendering thru girafe()theme( panel.spacing =unit(20, 'cm'), plot.margin =margin(l =5, b =10), # leave some space/margin at the bottom for caption 'room to breathe' axis.text =element_text(face ="bold"), # axis tick labels strip.text =element_text(face ="bold"), # facet labels panel.grid.major.x =element_blank(), axis.text.x =element_text(margin =margin(b =10, t =-10)), panel.grid.major.y =element_line(color ="gray90"), axis.ticks.x =element_blank(), plot.caption =element_markdown( family ="Roboto", face ='bold'))p_interactive<-p+geom_point_interactive( data =visible_points,aes( x =year, y =value, color =country, data_id =paste0(country, country_name)), alpha =0.1, fill ='white', show.legend =FALSE)+geom_point_interactive( data =all_series|>mutate( point_size =if_else(country%in%c('NO', 'Norway', 'PT', 'Portugal'), 3, 1.5), point_stroke =point_size),aes( x =year, y =value, data_id =paste0(country, country_name), tooltip =paste0("<div style='text-align:", if_else(year<=2015.250, "left", "right"), "; line-height: 1.1;'>", # tightens spacing"<div style='font-weight:bold; font-size:16px; color:",if_else(country_name=="Germany", "#9b6e00", eval(color_map)), ";'>", eval(tooltip_map), "</div>","<div style='font-size:16px;'>", round(year, 0), "</div>","<div style='font-size:16px;'>", round(value, 2), "</div>","</div>")), color ='white', fill ='white', shape =21, alpha =0)+geom_rect( inherit.aes =FALSE,aes(xmin =2024, xmax =Inf, ymin =-Inf, ymax =Inf), color =NA, fill ="white")+scale_color_manual( breaks =c("GB", "FR", "IE", "PT", "ES", "NO", 'DE', 'Germany'), # this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow color values =c("#264250", "#487caa", "#61A861", "#d27e4e", "#b01622", "#d43b45", '#DCA825', '#9b6e00')# this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow color)+# scale_color_identity() + # correctly apply the country color to the label's font# coord_cartesian(xlim = c(2002, 2024.5)) +theme( panel.grid.major.x =element_blank(), panel.grid.minor.x =element_blank())+# add persistent white circle that follows mouse (via selection); one to cover all data points is simply assigning data to country.year combogeom_point_interactive( data =all_series,aes( x =year, y =value, group =paste0(year, country_name)), shape =21, size =0.4, stroke =1, fill ='white', color ="grey85", alpha =0, show.legend =FALSE)
Notes on Above Plot: This graph looks deceptively simple at first glance; but was in fact by far the hardest to replicate out of all five.
Hallow Circles/Markers: I’m not sure what software Tom uses to render the interactive plots, but it was much harder than expected to replicate the persistent hollow circle marker that moves along each curve and ‘links’ the tooltip to the marker using a small vertical tick. It wasn’t for a lack of trying but I believe current ggiraph framework (I might be wrong) doesn’t natively support said functionality.
White Glow around Curve Borders: Another feature I wanted to implement/match with Tom’s original is the subtle white glow around the curves. Two functions (at least) derived from packages allow for this; geom_glowline() & geom_borderline() from the ggshadow & ggborderline packages, respectively; both of which ‘get disabled’ when being called in a ggiraph framework.
Hovering on a Curve vs. a Point: Upon hovering on a curve, original attenuates the points that make up the curve to a maximum, to only display the curve (devoid of any points shown); mine, while making other curves go far in the background, doesn’t completely eliminate the points that make up the said hovered curve because both points and curve roll up to to the same data_id; an added layer from the ggplot2 extension in ggiraph which links in this case said points to the curve (and country labels together). There might be a way for that uncoupling (between points and their curves) to happen after the fact, after the hover; but after trying arduously, I did not find it. Feel free to reach out or do a pull request to suggest an improvement (here or otherwise). Having said all that, I believe the rest remains faithful to the original throughout.
Value for Money
Notes on Below Plot:
Show the code
#|echo: false#|message: false#|warning: false#|include: falsedata<-value_for_money|>drop_na(country)|>filter(!country%in%c('Norway', 'Australia'))|>mutate( last_year =year==2023, country_tooltip =paste(country, year, sep =', '))|>arrange(country, year)|>mutate( country_fill =case_when(str_detect(country_tooltip, "US")~"US",str_detect(country_tooltip, "France")~"France",str_detect(country_tooltip, "Italy")~"Italy",str_detect(country_tooltip, "Germany")~"Germany",str_detect(country_tooltip, "Canada")~"Canada",str_detect(country_tooltip, "Japan")~"Japan",str_detect(country_tooltip, "UK")~"UK",TRUE~country_tooltip), country =factor(country, levels =c("Canada", "France", "Germany", "Japan", "Italy", "UK", "US")))p2<-data|>ggplot(aes(x =spend, y =le, color =last_year, fill =country_fill, group =country_fill))+geom_point_interactive(aes(size =size, data_id =country_fill, tooltip =country_tooltip), shape =21, alpha =1)+geom_text_interactive( data =data|>slice_max(year)|>distinct(country_fill, .keep_all =TRUE),aes( text =country_fill, label =country_fill, data_id =country_fill, tooltip =country_tooltip), hjust =-0.3, vjust =0, alpha =1)+scale_fill_manual( breaks =c("US", "France", "Italy", "Germany", "Canada", "Japan", "UK"), values =c("#4076A4", "#80B1E2", "#61A961", "#F5C55E", "#FFAEA9", "#DACFC0", "#E94F55"))+scale_color_manual( breaks =c(FALSE, TRUE), values =c('white', 'black'))+theme( plot.title =element_markdown(size =12, lineheight =1.2, linewidth =1.5), plot.subtitle =element_markdown(size =12, lineheight =1.2))+labs( title ='**Value for money**', subtitle ="How life expectancy and per-capita healthcare spend have changed since 2000.<br> <span style='background-color:#e94f55; color:white; padding:2px 4px;'>**UK**</span> spending is rising, but life expectancy has stalled.")+labs(x =NULL, y =NULL)+scale_x_continuous( breaks =seq(3000, 11000, 1000), labels =c(format(seq(3000, 10000, 1000), big.mark =",", trim =TRUE), "$11,000"))+coord_cartesian( xlim =c(2100, 11300), ylim =c(77, 86), expand =FALSE, clip ='off')+# add caption for p2labs( caption ="<span style='color:#232323; font-weight:bold;'>In US Dollars, adjusted for purchasing power and inflation. Excludes 2020-22.</span> <br><span style='color:#989799; font-weight:bold;'>Chart: Tom Calver | The Times and The Sunday Times</span>")+theme( text =element_text(family ='Roboto', color ='black', face ='bold'), panel.grid.minor =element_blank(), panel.grid.major.y =element_blank(), panel.grid.major =element_line(size =0.3, color ="grey80"), axis.line =element_line(color ="black", size =0.3), legend.position ='none', plot.caption =element_markdown( size =10, hjust =0, lineheight =1.2, face ='plain'))+annotate( geom ='rect', xmin =2075, xmax =2345, ymax =86.5, ymin =86.15, fill ='#e94f55')+# we also need to annotate the years 2000 and 2023 with Germany's yellow hex code (to match what Tom has)# not so much for Germany but for reference in general to the range of years for the plot# 2000 persistent text geom; for 2023 we use text geom; for 2000, we use label with no borders to bring forward '2000'annotate( geom ='label', label ='2000', x =4250, y =77.97, color ='#F5C55E', fill ='white', label.size =NA, fontface ="bold")+# 2023 persistent text geomannotate( geom ='text', label ='2023', x =6400, y =81.2, color ='#F5C55E', fontface ="bold")+# add x and y axes titles (within the plot itself)# y axisannotate( geom ='text', label ='Life expectancy', x =2685, y =85.8, color ='#7B7B7B', fontface ="bold", fontfamily ='Roboto', fontsize =15)+# x axis; i couldn't get the text to right justify for x axis title 'Per Capita\n spend' even after using hjust = 1, so i split that text in two lines and that works/matches Tom's annotate( geom ='text', label ='Per-capita', x =11200, y =77.5, color ='#7B7B7B', fontface ="bold", fontfamily ='Roboto', fontsize =15, hjust =1, vjust =.6)+annotate( geom ='text', label ='spend', x =11200, y =77.2, color ='#7B7B7B', fontface ="bold", fontfamily ='Roboto', fontsize =15, hjust =1, vjust =.6)girafe( ggobj =p2, width_svg =10, height_svg =6, options =list(opts_tooltip( css ="background: white; border: 1px solid #ddd; border-radius: 4px; padding: 6px; font-size: 14px; font-family: Roboto; font-weight: bold; color: #232323; text-align: left; box-shadow: 2px 2px 5px rgba(0, 0, 0, 0.1);"),opts_hover( css ="stroke-opacity: 1; fill-opacity: 1; color: #232323; font-size: 12px; alpha: 1;"),opts_hover_inv( css ="fill-opacity: 0.01; stroke-opacity: 0.01; color: transparent; font-size: 0.1px;")))
---title: | <div class="custom-title-block"> <span style="color:#000000; font-size:1em;">Replication of below article's Data and Visualizations</span><br> <span style="color:#333333; font-size:0.7em;">"We keep pumping money into the NHS. Is it good value?</span><br> <span style="color:#666666; font-size:0.5em;"> By <a href="https://www.thetimes.com/comment/columnists/article/we-keep-pumping-money-into-the-nhs-is-it-good-value-blq8bxc39" target="_blank" style="color:#000000; text-decoration:underline;">Tom Calver"</a> </span><br> <span style="font-size:0.7em; color:#333333;"> Karim K. Kardous <a href='mailto:kardouskarim@gmail.com' style='margin-left: 9px; font-size: 0.9em;'> <i class='bi bi-envelope'></i> </a> <a href='https://github.com/kkardousk' style='margin-left: 5px; font-size: 0.9em;'> <i class='bi bi-github'></i> </a> </span> </div>format: html: toc: true toc-depth: 4 toc-expand: true toc-title: 'Jump To' number-depth: 2 fig-format: retina fig-dpi: 300 code-link: true # requires both downlit and xml2 to be downloaded code-fold: true code-summary: '<i class="bi-code-slash"></i> Show the code' code-overflow: wrap code-tools: toggle: true # adds "Show All / Hide All"; also allows for all code copy (at once as quarto doc) css: styles.css highlight-style: github-dark df-print: paged page-layout: article embed-resources: true smooth-scroll: true link-external-icon: false link-external-newwindow: true fontsize: 1.1em linestretch: 0 linespace: 0 html-math-method: katex linkcolor: '#D35400'execute: echo: true warning: false message: false info: false cache: true freeze: autoeditor: visual---## Health Service and General Prelude {.text-justify}**About the data/article in a nutshell:** Tom uses a combination of sources to make his point → while not a bottom 'performer'; especially compared to the US, UK's National Health Service (NHS) did not wait for the COVID crisis to show its limits and harbor increasing consumer dissatisfaction.<br> That downward trend can be traced back to the early 2010's; and that comes from the service running lean - too lean, among other reasons.<br> For more detail on the sources, feel free to visit the actual article with link embedded at the top of the page in to author's name.<br>If you like to download the data, feel free to use urls [here](#metadata).**Overall Strategy for building first plot:** The data for first plot is smoothed out/interpolated data; this is usually done to 'well smooth' the data and delineate clearer trends over time; in this case select European countries' survey takers' scores on their overall satisfaction level towards their respective countries' public health services (ESS - European Social Survey).<br> In this case, there is an additional reason the author smoothed the data; it is to 'complete' the years given that the ESS is done once every two years; note that I am assuming here but it's not an outrageous assumption to make. So in order to match that smoothing, I go with `geom_smooth()` from `{ggplot2}` and keep `span` at default; after a few iterations; the data points (for non survey years) match highly to what Tom displays in the first graph.<br> Finally, since the graph is interactive, I use `{ggiraph}` package to emulate said interactivity; a JS based R package that lets you add tooltips/hover/highlight upon hover/downplay non-hovered, etc. all the usual things one expects from an interactive plot; without having to build a Shiny app; which for this exercise/first plot (and the rest); would be like building a Gatling gun to aim at an ant.**Steps taken:** <br> While I try to be as detailed in my comments as possible; it's still helpful to lay out the step by step process as a numbered list to get the overall chain of what the code is supposed to do on a high level - without having to go into the nitty gritty- the comments in the code chunks and in that regard should hopefully help:<br>1. Found the source of the data from HTML Source Page; clicked on Network tab after hovering on the plot panel; refreshed the page; and found 'dataset' under 'datawrapper'.2. The data was wide in structure (from raw csv): <br> 17 columns (1 column for year, 8 hex-coded columns with imputed/smoothed values, and 8 columns for country abbreviations with survey data for even years, NULL otherwise).3. Discovered the hex columns and country columns didn't align in a straightforward way. Columns were randomly ordered within each set, requiring a ranking approach rather than pairwise matching (one hex column to the symmetrical position of the country labeled column).4. Implemented a solution by sorting satisfaction scores per year, which helped group values by country through proximity of their scores. This approach works well since the values are interpolated through smoothing, making them very close to one another from row to row. This might not work in other cases, but it does here.5. Combined two sorted datasets: year + hex columns, and year + country columns to create a properly aligned mapping, joined facts data (with scores/values) on newly created mapped long datasets (converted from wide- almost always much harder to work with) to then finalize the dataset for visualization. More detail on 'finalized the dataset' can be found in the comments of the actual code. <br>Prepared the final clean dataset for interactive plotting with ggiraph.## Health Service Satisfaction```{r}#|echo: false#|message: false#|warning: false#|include: falseinvisible({# install pacman if it's not already installedif (!requireNamespace("pacman", quietly =TRUE)) install.packages("pacman")# install.packages("gdtools", type = "source")## NOTE; potential (might not be needed) steps on mac for registering then loading Roboto font into Quarto below## from terminal/shell# brew install cairo fontconfig freetype pkg-config# export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:/opt/homebrew/share/pkgconfig"# export PKG_CFLAGS="-I/opt/homebrew/include"# export PKG_LIBS="-L/opt/homebrew/lib"## then from Rstudio# install.packages("gdtools", type = "source") pacman::p_load( xml2, downlit, gdtools, tidyverse, quarto, chromote, here, tidycensus, janitor, purrr, ggtext, ggshadow, ggiraph, gfonts, showtext, ggborderline, shiny, gt, rsvg, magick, stringr, ggimage ) showtext::showtext_auto(enable =TRUE) })# loading in the fonts from venv created that has the Roboto ones (for ploting); a widespread font used by The Times, The Economist, etc.tryCatch({ sysfonts::font_add(family ="roboto-regular", regular ="~/Library/Fonts/Roboto-Regular.ttf") sysfonts::font_add(family ="roboto-bold", regular ="~/Library/Fonts/Roboto-Bold.ttf") sysfonts::font_add(family ="roboto-condensed", regular ="~/Library/Fonts/Roboto_Condensed-Regular.ttf") sysfonts::font_add(family ="roboto-condensed-bold", regular ="~/Library/Fonts/Roboto_Condensed-Bold.ttf")}, error =function(e) {message("Fonts not available locally: ", e$message)})# used for most things within ggplot so we can assign them as default themes settheme_set(theme_minimal() +theme(text =element_text(family ="roboto-regular", size =11),plot.title =element_text(family ="roboto-bold", size =16),plot.subtitle =element_text(family ="roboto-regular", size =12),axis.text =element_text(family ="roboto-regular", size =10),axis.title =element_text(family ="roboto-condensed", size =11) ))```While above took care of importing required libraries and setting general options such as plot theme and text font to be used; below is the start of data related tasks; from initial pull, to wrangling, to finally output the visualizations. #### Data Pull```{r}#|echo: false#|message: false#|warning: false#|include: false#|eval: falseinvisible({ b <- ChromoteSession$new() b$Page$navigate("https://www.thetimes.com/comment/columnists/article/we-keep-pumping-money-into-the-nhs-is-it-good-value-blq8bxc39")Sys.sleep(3) # allow some time for dynamic content to render})# extract all iframe srcs (joined by || in this case)iframes_html <- b$Runtime$evaluate("Array.from(document.querySelectorAll('iframe')).map(el => el.src).join('||')")$result$value# split and filter valid Datawrapper url'schart_urls <-str_split(iframes_html, "\\|\\|")[[1]] |>str_subset("^https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+$")all_data <- purrr::map_dfr(chart_urls, function(url) { message("Navigating to: ", url) b$Page$navigate(url)Sys.sleep(3) html <- b$Runtime$evaluate("document.documentElement.outerHTML")$result$value# match visible chart values if any pattern <-'aria-datavariables="year,\\s*([A-Z]+)".*?aria-datavalues="([0-9]{4}),\\s*([0-9.]+)"' matches <-str_match_all(html, pattern)[[1]]# match dataset.csv url as well csv_pattern <-"https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+/dataset\\.csv" csv_link <-str_extract(html, csv_pattern)if (is.na(csv_link)) { csv_link <-str_glue("{url}/dataset.csv") }tibble(chart_url = url,country =if(nrow(matches)) matches[, 2] elseNA,year =if(nrow(matches)) as.integer(matches[, 3]) elseNA,value =if(nrow(matches)) as.numeric(matches[, 4]) elseNA,dataset_csv = csv_link )})# add a custom gt boilerplate -from {gt} package (great tables)-to reduce code redundancy (having to copy/paste same chunks of code every # time we turn a tibble into a gt object)gt_nyt_custom <-function(x, title ='', subtitle ='', first_10_rows_only =TRUE){ x <- x |>clean_names(case ='title') numeric_cols <- x |>select(where(is.double)) |>names() integer_cols <- x |>select(where(is.integer)) |>names() title_fmt <-if(title !="") glue::glue("**{title}**") else"" subtitle_fmt <-if(subtitle !="") glue::glue("*{subtitle}*") else"" x |> (\(x) if (first_10_rows_only) slice_head(x, n =10) else x)() |>gt() |>tab_header(title =md(title_fmt),subtitle =md(subtitle_fmt) ) |>tab_style(style =list(cell_text(color ='#333333') ),locations =cells_body() ) |>tab_style(style =list(cell_text(color ='#CC6600', weight ='bold') ),locations =cells_column_labels(everything()) ) |>fmt_number(columns =c(numeric_cols),decimals =1 ) |>fmt_number(columns =c(integer_cols),decimals =0 ) |>tab_options(table.font.names =c("Merriweather", "Georgia", "serif"),table.font.size =14,heading.title.font.size =18,heading.subtitle.font.size =14,column_labels.font.weight ="bold",column_labels.background.color ="#eeeeee",table.border.top.color ="#dddddd",table.border.bottom.color ="#dddddd",data_row.padding =px(6),row.striping.include_table_body =TRUE,row.striping.background_color ="#f9f9f9" )}```#### Metadata```{r}#|echo: false#|message: false#|warning: false# reveal dataset urls/csvsall_data |>count(url = chart_url, download_link = dataset_csv ) |>select(-n) |>gt_nyt_custom(title ='Dataset Ids' ) |>cols_label(Url ="Plot URL",`Download Link`="Link to CSV" ) |>tab_footnote("In the event you download the links yourself and run your own script, the third and last should be treated as tsv files, otherwise csv's" ) ```#### Data Sample ```{r}# reveal data sample for year 2004 as an exampleall_data |>filter(year ==2004) |>select(2:last_col()) |>gt_nyt_custom() |>tab_header(title =md("**Chart Data Summary**"),subtitle =md("*Extracted from embedded datawrapper in the HTML*") )```#### Data Wrangling```{r}#|echo: false#|message: false#|warning: false# for easier referencing, assign file types (csv where appropriate, tsv otherwise) file_info <-tibble(path =unique(all_data$dataset_csv),name =c("health_service_sat:csv", "value_for_money:csv", "room_to_improve:tsv", "barely_beds:csv", "budget_breakdowns:tsv") ) |>separate( name, into =c('dataset_name', 'file_type'), sep ="\\:" )# loop thru datasets, read them in, and then assign them to the global environmentinvisible({ file_info |>mutate(data =pmap(list(path, file_type), \(path, file_type) if(file_type =="csv") read_csv(path) elseread_tsv(path) ) ) |>select(dataset_name, data) |>deframe() |>list2env(envir = .GlobalEnv)})# set country 'switch'; so that tooltip can reflect full country name (spelled out) accordingly for imputed values (non survey years smoothed values)country_labels <-c(NO ="Norway", DE ="Germany", ES ="Spain", FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal" )country_label_tibble <-c(NO ="Norway", DE ="Germany", ES ="Spain",FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal" ) |>enframe()# clean the first dataset: health_service_sat, to prep for plotting.# it has 17 columns: 1 year column, 8 hex-coded columns (imputed/smoothed values), # and 8 columns for country abbreviations (survey data, even years).# initially assumed hex columns align pairwise with country columns; but it was not the case.s# columns are randomly ordered within each set, so we use ranking instead.# sorting by satisfaction score per year helps group values by country (via proximity of their scores).# we then combine two sorted datasets: year + hex columns, and year + country columns# note that this might not always be the go-to solution but in this case, # and given that the values are interpolated (through smoothing), we can safely bet that the values will be very close to one another# from one row to the nexthex_to_country_mapping <- health_service_sat |>pivot_longer(-year ) |>slice_max(year) |>filter(str_starts(name, '\\#') &!str_detect(name, 'A9FF') # looking at last values from article curves, we can infer this is Italy so ) |>arrange(value) |>bind_cols( health_service_sat |>pivot_longer(-year ) |>slice_max(year) |>filter(!str_starts(name, '\\#') &!str_detect(name, 'IT') ) |>arrange(value) ) |>select(years =1, hex_code =2, second_to_last_val =3, years_max =4, country_abb =5, last_val =6 ) |>mutate(val_diff =abs(last_val - second_to_last_val) ) |>arrange(val_diff) |>select( hex_code, country_abb ) |># also join to country_label_tibble to get full country names for future useinner_join( country_label_tibble, join_by(country_abb == name) )# now we can map the randomly assigned hex value labels to the actual columns/countries, and create 8 series,# one for each countryhealth_service_sat <- health_service_sat |>pivot_longer(-year ) |>left_join( hex_to_country_mapping, join_by(name == hex_code) ) |>mutate(country_abb =coalesce(country_abb, name) ) |>inner_join( hex_to_country_mapping, join_by(country_abb == country_abb) ) |>select( year, country_abb,country = value,value = value.x ) |>drop_na() extract_smooth_build <-function(tibble, country ='GB'){ initial_pull <- all_data |>filter(country %in% {{country}}) |>ggplot(aes(x = year, y = value)) +geom_smooth(method ='loess')# fetch country abbs for ids, and ranges country_ids <-c(na.omit(all_data |>pull(country) |>unique())) country_max <- all_data |>filter(country == {{country}}) |>pull(value) |>max() country_min <- all_data |>filter(country == {{country}}) |>pull(value) |>min()# access smoothed, include actual years to imputed/smoothed points, cap at min max per country/series# and keep only columns of interest smoothed_df <-ggplot_build(initial_pull)[[1]] |>as.data.frame() |>as_tibble() complete_series <- smoothed_df |>select(year = x, value = y) |>mutate(country := country) |>bind_rows( all_data |>filter(country == {{country}}) |>select(year, value) ) |>mutate(year =as.integer(year),year_val_tie_breaker =if_else(is.na(country), 1, 0) ) |>group_by(country, year) |>arrange(desc(year_val_tie_breaker)) |>mutate(ties =row_number()) |>filter(if (n() <4) TRUEelse ties + year_val_tie_breaker !=1# make sure every year/country combo gets same no. of obs# and that original values (only in the event a given year is even or survey year) take precedence over smoothed ones# otherwise just pass/do nothing ) |>ungroup() |># ensuer smoothed values don't go below/beyond lower/upper boundsmutate(value =pmin(pmax(value, country_min), country_max) ) |>arrange(year) |>fill(country, .direction ='downup') |># since every year starts with select(year, country, value) return(complete_series)}# country vector to loop thrucountry_name_abbs <-c(na.omit(all_data |>pull(country) |>unique()))# combine all seriesall_series <-map_dfr(.x = country_name_abbs, ~extract_smooth_build(tibble = all_data, country = .x))# set contry 'switch; so that tooltip can change accordingly for odd numebred yearscountry_labels <-c(NO ="Norway", DE ="Germany", ES ="Spain",FR ="France", GB ="UK", IE ="Ireland", PT ="Portugal")# adding year as continuous variable (decimal years) so that points don't overlap but strech over whithin a year to year spanall_series <- all_series |>mutate(rn =row_number(), .by =c(country, year) ) |>mutate(decimal_year =if_else(rn ==1, year, year + rn /8) ) |>mutate(year = decimal_year ) |>select(-decimal_year)# also join on country full name mapping so we can generate a consolidate data_id that links the aestethics together (for interactive simultaneus highlighting, etc.)all_series <- all_series |>inner_join( country_labels |>enframe() |>rename(values = value), join_by(country == name) ) |>mutate(data_id =str_c(country, values) ) |>select(-values) |>mutate(country_name =str_sub(data_id, 3, 20) )# also generate visible (and non visible years by exclusion) as they don't visually get the same properties; visible (even numbered years) get the country abb as a tooltip (and larger markers/circles), # while 'invisible' ones (odd numbered years along with year 2023) get their country names fully spelled out and get transparent marker/circle fillvisible_years <-c(seq(2002, 2022, 2), 2023)visible_points <- all_series |>filter(round(year) %in% visible_years &floor(year) ==ceiling(year))invisible_points <- all_series |>filter(!round(year) %in% visible_years &floor(year) !=ceiling(year))# final touchups# set color mappingscolor_map <-expr(case_when( country %in%c('NO', 'Norway') ~'#d43b45', country %in%c('DE', 'Germany') ~'#DCA825', country %in%c('ES', 'Spain') ~'#b01622', country %in%c('FR', 'France') ~'#487caa', country %in%c('GB', 'UK') ~'#264250', country %in%c('IE', 'Ireland') ~'#61A861', country %in%c('PT', 'Portugal') ~'#d27e4e',TRUE~'#000000' ))# set tooltip mappingstooltip_map <-expr(case_when(!year %in%c(seq(2002, 2022, 2), 2023) & country %in%names(country_labels) ~ country_labels[country],TRUE~ country ))label_data <- all_series |>group_by(country) |>arrange(desc(year)) |>filter(row_number() ==1) |>mutate(y_offset =case_when( country =='ES'~ value + .1, country =='FR'~ value +0, country =='DE'~ value - .05, country =='GB'~ value - .1, country =='PT'~ value + .2,TRUE~ value) ) |>ungroup() |>mutate(country_name =case_when( country =="DE"~"Germany", country =="ES"~"Spain", country =="FR"~"France", country =="GB"~"UK", country =="IE"~"Ireland", country =="NO"~"Norway", country =="PT"~"Portugal",TRUE~NA_character_ ),country_color =case_when( country %in%c("DE", 'Germany') | country_name %in%'Germany'~"#9b6e00", # override DE/Germany label color here since curve color is different than country label color (only one) country %in%c('NO', 'Norway') ~'#d43b45', country %in%c('ES', 'Spain') ~'#b01622', country %in%c('FR', 'France') ~'#487caa', country %in%c('GB', 'UK') ~'#264250', country %in%c('IE', 'Ireland') ~'#61A861', country %in%c('PT', 'Portugal') ~'#d27e4e',TRUE~'#000000' ) ) |>inner_join( country_labels |>enframe() |>rename(values = value), join_by(country == name) ) |>mutate(data_id =str_c(country, values),country =if_else(country =='DE', 'Germany', country) )# add caption to match Tom'scaption_text <-"<span style='color:#232323;'>0 = extremely bad, 10 = extremely good</span><br><span style='color:#939291; font-weight: normal;'>Chart: Tom Calver | The Times and The Sunday Times • Source: ESS/K. Kardous</span>"p <- all_series |>distinct() |>ggplot(aes(x = year, y = value, group = data_id,color = country) ) +scale_color_manual(values =c('NO'="#d43b45",'DE'='#DCA825','ES'='#b01622','FR'='#487caa','GB'='#264250','IE'='#61A861','PT'='#d27e4e') ) +scale_y_continuous(breaks =seq(0, 7, 1), limits =c(0, 8) ) +scale_x_continuous(breaks =seq(2002, 2022, 2), limits =c(2002, 2025),expand =c(0, 0.1) ) +theme(legend.position ='none',panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() ) +geom_smooth_interactive(data = all_series,aes(x = year, y = value, data_id =paste0(country, country_name)),method ="loess",se =FALSE,linewidth =3.5, # thick line acts as the 'border'alpha =1,show.legend =FALSE,color ="white" ) +# colored interactive smooth linegeom_smooth_interactive(data = all_series |>filter(!country %in%'IE'),aes(data_id =paste0(country, country_name)),method ="loess", se =FALSE, linewidth =0.9, fill =NA,show.legend =FALSE ) +geom_smooth_interactive(data = all_series |>filter(country %in%'IE'),aes(data_id =paste0(country, country_name)),method ="loess",se =FALSE, linewidth =0.9, fill =NA, show.legend =FALSE ) +scale_y_continuous(breaks =seq(0, 7, 1), limits =c(0, 8) ) +scale_x_continuous(breaks =seq(2002, 2022, 2), limits =c(2002, 2025),expand =c(0, 0.1) ) +labs(x =NULL,y =NULL,caption = caption_text ) +# final touchoups before interactive rendering thru girafe()theme(panel.spacing =unit(20, 'cm'),plot.margin =margin(l =5, b =10), # leave some space/margin at the bottom for caption 'room to breathe'axis.text =element_text(face ="bold"), # axis tick labelsstrip.text =element_text(face ="bold"), # facet labelspanel.grid.major.x =element_blank(),axis.text.x =element_text(margin =margin(b =10, t =-10)),panel.grid.major.y =element_line(color ="gray90"),axis.ticks.x =element_blank(),plot.caption =element_markdown(family ="Roboto",face ='bold' ) ) p_interactive <- p +geom_point_interactive(data = visible_points,aes(x = year,y = value, color = country,data_id =paste0(country, country_name) ),alpha =0.1, fill ='white', show.legend =FALSE ) +geom_point_interactive(data = all_series |>mutate(point_size =if_else(country %in%c('NO', 'Norway', 'PT', 'Portugal'), 3, 1.5),point_stroke = point_size ),aes(x = year, y = value,data_id =paste0(country, country_name),tooltip =paste0("<div style='text-align:", if_else(year <=2015.250, "left", "right"), "; line-height: 1.1;'>", # tightens spacing"<div style='font-weight:bold; font-size:16px; color:",if_else(country_name =="Germany", "#9b6e00", eval(color_map)), ";'>", eval(tooltip_map), "</div>","<div style='font-size:16px;'>", round(year, 0), "</div>","<div style='font-size:16px;'>", round(value, 2), "</div>","</div>" ) ),color ='white',fill ='white', shape =21, alpha =0 ) +geom_rect(inherit.aes =FALSE,aes(xmin =2024, xmax =Inf, ymin =-Inf, ymax =Inf),color =NA, fill ="white" ) +scale_color_manual(breaks =c("GB", "FR", "IE", "PT", "ES", "NO", 'DE', 'Germany'), # this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow colorvalues =c("#264250", "#487caa", "#61A861", "#d27e4e", "#b01622", "#d43b45", '#DCA825', '#9b6e00') # this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow color ) +# scale_color_identity() + # correctly apply the country color to the label's font# coord_cartesian(xlim = c(2002, 2024.5)) +theme(panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() ) +# add persistent white circle that follows mouse (via selection); one to cover all data points is simply assigning data to country.year combogeom_point_interactive(data = all_series,aes(x = year,y = value,group =paste0(year, country_name) ),shape =21,size =0.4,stroke =1,fill ='white',color ="grey85",alpha =0,show.legend =FALSE ) ```#### Data Visualization ```{r}#|echo: false#|message: false#|warning: falsep_ggraph_ready <- p_interactive +theme_minimal() +theme(panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank(),legend.position ='none' ) +labs(caption = caption_text ) +theme(plot.caption =element_markdown(lineheight =1.2,hjust =-0.01,margin =margin(t =10, l =-7, r =1),halign =0 ),axis.text.x =element_text(margin =margin(t =-7, l =3, b =7, r =-3)) ) +geom_segment(aes(x =2002, xend =2023, y =0, yend =0), color ='black', linewidth = .1 ) +geom_label_interactive(data = all_series |>bind_rows( all_series |>slice_max(year) |>arrange(desc(value)) |>mutate(country_name =str_sub(data_id, 3, 20),year =2024,value =c(c(7, 5.8), c(5.7, 5.4, 5.1, 4.8, 4.5) - .3) ) )|>slice_max(year) |>mutate(year =2023.4),aes(x = year,y = value,group =paste0(country, country_name),label =c("Norway", "Spain", "France", "Germany", "UK", "Ireland", "Portugal"),data_id =paste0(country, country_name) ),label.size =NA,fill =NA,color =c("#d43b45", "#b01622", "#487caa", "#9b6e00", "#264250", "#61A861", "#d27e4e"),size =3.3,hjust =0,vjust =-.2,# fontface = 'bold',inherit.aes =FALSE,alpha =1 ) girafe(ggobj = p_ggraph_ready,options =list(opts_tooltip(css =" background: transparent; border: none; box-shadow: none; font-family: sans-serif; text-shadow: 0 0 4px rgba(234, 255, 255, 1), 0 0 4px rgba(234, 255, 255, 1), 0 0 4px rgba(255, 255, 255, 1); line-shadow: 0 0 4px rgba(234, 255, 255, 1), 0 0 4px rgba(234, 255, 255, 1), 0 0 4px rgba(255, 255, 255, 1); border-radius: none; transform: translate(-50%, 20px); transition: all 0.2s ease-in-out;",delay_mouseover =300,delay_mouseout =200 ),opts_hover(css ="stroke-width: 3; stroke-opacity: 0.9; fill-opacity: 0.9; opacity: 1;",nearest_distance =30,reactive =FALSE ),opts_hover_inv(css ="stroke-width: .2; stroke-opacity: 0.2; fill-opacity: 0.2; opacity: 0.4;" ) ))```::: {.text-justify}**Notes on Above Plot:** This graph looks deceptively simple at first glance; but was in fact by far the hardest to replicate out of all five. <br> _Hallow Circles/Markers_: I'm not sure what software Tom uses to render the interactive plots, but it was much harder than expected to replicate the persistent hollow circle marker that moves along each curve and 'links' the tooltip to the marker using a small vertical tick. It wasn't for a lack of trying but I believe current `{ggiraph}` framework (I might be wrong) doesn't natively support said functionality.<br>_White Glow around Curve Borders_: Another feature I wanted to implement/match with Tom's original is the subtle white glow around the curves. Two functions (at least) derived from packages allow for this; `geom_glowline()` & `geom_borderline()` from the `{ggshadow}` & `{ggborderline}` packages, respectively; both of which 'get disabled' when being called in a `{ggiraph}` framework.<br>_Hovering on a Curve vs. a Point_: Upon hovering on a curve, original attenuates the points that make up the curve to a maximum, to only display the curve (devoid of any points shown); mine, while making other curves go far in the background, doesn't completely eliminate the points that make up the said hovered curve because both points and curve roll up to to the same `data_id`; an added layer from the `{ggplot2}` extension in `{ggiraph}` which links in this case said points to the curve (and country labels together).<br> There might be a way for that uncoupling (between points and their curves) to happen after the fact, after the hover; but after trying arduously, I did not find it. Feel free to reach out or do a pull request to suggest an improvement (here or otherwise). <br>Having said all that, I believe the rest remains faithful to the original throughout.:::## Value for Money**Notes on Below Plot:**```{r}#|echo: false#|message: false#|warning: false#|include: falsedata <- value_for_money |>drop_na(country) |>filter(!country %in%c('Norway', 'Australia') ) |>mutate(last_year = year ==2023,country_tooltip =paste(country, year, sep =', ') ) |>arrange(country, year) |>mutate(country_fill =case_when(str_detect(country_tooltip, "US") ~"US",str_detect(country_tooltip, "France") ~"France",str_detect(country_tooltip, "Italy") ~"Italy",str_detect(country_tooltip, "Germany") ~"Germany",str_detect(country_tooltip, "Canada") ~"Canada",str_detect(country_tooltip, "Japan") ~"Japan",str_detect(country_tooltip, "UK") ~"UK",TRUE~ country_tooltip ),country =factor(country, levels =c("Canada", "France", "Germany", "Japan", "Italy", "UK", "US")) )p2 <- data |>ggplot(aes(x = spend, y = le, color = last_year, fill = country_fill, group = country_fill)) +geom_point_interactive(aes(size = size, data_id = country_fill, tooltip = country_tooltip),shape =21, alpha =1 ) +geom_text_interactive(data = data |>slice_max(year) |>distinct(country_fill, .keep_all =TRUE),aes(text = country_fill,label = country_fill,data_id = country_fill,tooltip = country_tooltip ),hjust =-0.3, vjust =0, alpha =1 ) +scale_fill_manual(breaks =c("US", "France", "Italy", "Germany", "Canada", "Japan", "UK"),values =c("#4076A4", "#80B1E2", "#61A961", "#F5C55E", "#FFAEA9", "#DACFC0", "#E94F55") ) +scale_color_manual(breaks =c(FALSE, TRUE),values =c('white', 'black') ) +theme(plot.title =element_markdown(size =12, lineheight =1.2, linewidth =1.5),plot.subtitle =element_markdown(size =12, lineheight =1.2) ) +labs(title ='**Value for money**',subtitle ="How life expectancy and per-capita healthcare spend have changed since 2000.<br> <span style='background-color:#e94f55; color:white; padding:2px 4px;'>**UK**</span> spending is rising, but life expectancy has stalled." ) +labs(x =NULL, y =NULL) +scale_x_continuous(breaks =seq(3000, 11000, 1000),labels =c(format(seq(3000, 10000, 1000), big.mark =",", trim =TRUE), "$11,000") ) +coord_cartesian(xlim =c(2100, 11300),ylim =c(77, 86),expand =FALSE,clip ='off' ) +# add caption for p2labs(caption ="<span style='color:#232323; font-weight:bold;'>In US Dollars, adjusted for purchasing power and inflation. Excludes 2020-22.</span> <br><span style='color:#989799; font-weight:bold;'>Chart: Tom Calver | The Times and The Sunday Times</span>" ) +theme(text =element_text(family ='Roboto', color ='black', face ='bold'),panel.grid.minor =element_blank(),panel.grid.major.y =element_blank(),panel.grid.major =element_line(size =0.3, color ="grey80"),axis.line =element_line(color ="black", size =0.3),legend.position ='none',plot.caption =element_markdown(size =10,hjust =0,lineheight =1.2,face ='plain' ) ) +annotate(geom ='rect',xmin =2075,xmax =2345,ymax =86.5,ymin =86.15,fill ='#e94f55' ) +# we also need to annotate the years 2000 and 2023 with Germany's yellow hex code (to match what Tom has)# not so much for Germany but for reference in general to the range of years for the plot# 2000 persistent text geom; for 2023 we use text geom; for 2000, we use label with no borders to bring forward '2000'annotate(geom ='label',label ='2000',x =4250,y =77.97,color ='#F5C55E',fill ='white',label.size =NA,fontface ="bold" ) +# 2023 persistent text geomannotate(geom ='text',label ='2023',x =6400,y =81.2,color ='#F5C55E',fontface ="bold" ) +# add x and y axes titles (within the plot itself)# y axisannotate(geom ='text',label ='Life expectancy',x =2685,y =85.8,color ='#7B7B7B',fontface ="bold",fontfamily ='Roboto',fontsize =15 ) +# x axis; i couldn't get the text to right justify for x axis title 'Per Capita\n spend' even after using hjust = 1, so i split that text in two lines and that works/matches Tom's annotate(geom ='text',label ='Per-capita',x =11200,y =77.5,color ='#7B7B7B',fontface ="bold",fontfamily ='Roboto',fontsize =15,hjust =1,vjust = .6 ) +annotate(geom ='text',label ='spend',x =11200,y =77.2,color ='#7B7B7B',fontface ="bold",fontfamily ='Roboto',fontsize =15,hjust =1,vjust = .6 )girafe(ggobj = p2,width_svg =10, height_svg =6,options =list(opts_tooltip(css ="background: white; border: 1px solid #ddd; border-radius: 4px; padding: 6px; font-size: 14px; font-family: Roboto; font-weight: bold; color: #232323; text-align: left; box-shadow: 2px 2px 5px rgba(0, 0, 0, 0.1);" ),opts_hover(css ="stroke-opacity: 1; fill-opacity: 1; color: #232323; font-size: 12px; alpha: 1;" ),opts_hover_inv(css ="fill-opacity: 0.01; stroke-opacity: 0.01; color: transparent; font-size: 0.1px;" ) ))```#### Data Sample Below is a sample of the data